Context of Data

  1. The dataset consists of more than 500,000 rows and 8 columns. The company is an e-commerce gift shop;
  2. Main customer segment: wholesaler;
  3. Transaction period: 12/01/2010 to 12/09/2011.

Tasks

  1. Obtain the top 10 countries of the orders placed;
  2. Obtain the top 10 countries of the quantity sold;
  3. Obtain the top 10 countries of the total sales;
  4. Obtain the top months of the orders placed;
  5. Obtain the top months of the quantity sold;
  6. Obtain the top months of the total sales;
  7. Use RFM model to run customer behavior analysis.

Import libraries

Load Data and Understand Data

Clean Data

1. Check missing values rate for each column

  1. CustomerID has 24.93% missing rate, which is relatively high;
  2. Desciption has 0.27% missing rate, which is low and does not impact the analysis.

2. Remove 'Description' column and fill missing values in 'CustomerID' column

In the real world, we will need to verify with the related department on the missing values before any action.

3. Change the InvoiceDate format from String to DateTime, and split to 'Year', 'Month' and 'Day'

4. Add the column - Calculate the total sale price for each row

5. Drop Duplicates

6. Check Outlier Values

6.1 Check the UnitPrice that equals to or below zero

  1. There are two types of records where UnitPrice equals to or below zeros;
  2. 2510 of 2512 records are 0, which can be the promotional gift from the shop;
  3. 2 of 2512 records are -11062.06, which can be bad debts or refund.

6.2 Remove Quantity and UnitPrice with negative values

Data Analysis and Data Visualization

1. Top 10 countries of the total number of orders

2. Top 10 countries of the total quantity sold

3. Top 10 countries of the total sales;

4. Top months of the total number of orders

5. Top months with the amount of quantity sold

6. Top months of the total sales

7. Obtain the Average Transaction Value

The average transaction value for each InvoiceNo is 533.14 GBP.

8. Conduct consumer behavior analysis

9. Analyze customer behavior with RFM model

9.1 Recency

9.2 Frequency

9.3 Monetary

The set may contains outlier values, as the standard division is too large, so we limit the m_value to the mean value and draw the figure.

9.4 Customer hierarchy

9.5 Visualize customer segmentation

Results obtained from Data Analysis

1. The customer’s largest number of orders placed, quantity sold and total sales comes from United Kingdom, since it is a UK based e-commerce shop.

2. The top 10 countries of total number of orders places are as following:

  1. United Kingdom
  2. Germany
  3. France
  4. EIRE
  5. Belgium
  6. Netherlands
  7. Spain
  8. Portugal
  9. Australia
  10. Switzerland

3. The top 10 countries of total quantity sold are as following:

  1. United Kingdom
  2. Netherlands
  3. EIRE
  4. Germany
  5. France
  6. Australia
  7. Sweden
  8. Switzerland
  9. Spain
  10. Japan

4. The top 10 countries of total sales amount are as following:

  1. United Kingdom
  2. Netherlands
  3. EIRE
  4. Germany
  5. France
  6. Australia
  7. Spain
  8. Switzerland
  9. Belgium
  10. Sweden

Besides United Kingdom, Netherlands, NIRE, Germany, France, and Australia are recommended with the most attention and efforts.

5. November 2011 generated the largest amount of orders, quantities and sales, following with October and September, as the holidays are coming in the next few months. Promotions can be enhanced during these time periods.

6. The company tends to send out free gift with purchases each month, the top months of sending out free gifts are as following:

  1. November
  2. April
  3. October
  4. March
  5. December

7. The largest portion of company’s customer segments are ‘Best customers’ and ‘Potential Loyalists’, which are the major groups of overall revenues for the company and are the first priority the company needs to pay attention to. Also ‘Lost Customers’, ‘New Customers’ and ‘At Risk Customers’ are the second priority the company needs to deal with.

  1. 23.9% ‘Best Customers’: keeping customer satisfied with the productand services are the top priority. Company can also reward customers with the first experience on new products and promotions. Further analyzing on customer preferences may provide additional opportunities.

  2. 23.4% ‘Potential Loyalties’: it is important to offer membership or recommend related products to upsell them and help them become the best customers.

  3. 21.3% ‘Lost Customers’: it is a relatively large percentage of customer segments, though it is often challenging to re-engage the lost customers, the high value of these customers makes it worth while trying. It is important to communicate with them on the basis of their specific preference, analyze from earlier transaction data and send email campaigns to recall the brand name.

  4. 19.1% ‘New Customer’: it is important to start building relationships with these customers, and keep their purchases valued and satisfied, also company can send special offers to increase their visits.

  5. 10.5% ‘At Risk Customers’: it is important to bring them back by sending out relevant promotions, running surveys and providing personalized reactivation campaigns to reconnect and encourage another purchase.